Profiling a data organizer
by James on May.18, 2009, under MySQL, PHP
For the past week or so I have been building a data organizer. Here is what it does:
- Collects data from various sources.
- Checks for duplicates, and stores for further checking.
- Spawns multiple threads to check the stored data.
Initially I used text files to store the data and checked it for duplicates with the array_unique() function.
This script used a lot of memory because, in order to remove the duplicates correctly, every chunk of data had to be loaded into an array first. The constant file reading and writing also pushed CPU usage to 90%.
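A minimal sketch of why the text-file approach is memory-hungry, assuming one data fragment per line (the file layout and the loadUniqueFragments() name are hypothetical): array_unique() can only remove duplicates it can see, so the whole file has to be read into one array first.

```php
<?php
// Hypothetical helper: assumes one fragment per line in a plain text file.
function loadUniqueFragments(string $path): array
{
    // file() reads the entire file into memory at once; on large inputs
    // this single array is where the memory cost comes from.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // array_unique() keeps the first occurrence of each value but
    // preserves the original keys, so reindex with array_values().
    return array_values(array_unique($lines));
}
```

For example, a file whose lines are a, b, a, c, b yields ['a', 'b', 'c']. Memory use grows with the size of the whole file, not with the number of unique fragments.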
I then used the usleep() function to pause the process between operations and reduce the constant CPU load. Making the script sleep for 1 millisecond (usleep(1000), since usleep() takes microseconds) after collecting each data fragment brought CPU usage down to as low as 20%. However, this alone did not solve the problem: I still had to do something about the memory, so I created a MySQL database and started storing the data in it.
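The throttling step might look roughly like this. It is a sketch under assumptions: collectFragments() and the $fetch callback are hypothetical stand-ins for the real data sources, and the pause length matches the 1 millisecond described above.

```php
<?php
// usleep() takes microseconds, so 1000 here means a 1 ms pause.
const PAUSE_MICROSECONDS = 1000;

// Hypothetical collector: $fetch stands in for whatever reads one fragment.
function collectFragments(callable $fetch, int $count): array
{
    $fragments = [];
    for ($i = 0; $i < $count; $i++) {
        $fragments[] = $fetch($i);
        // Give up the CPU briefly so the loop does not spin at full speed.
        usleep(PAUSE_MICROSECONDS);
    }
    return $fragments;
}

$start = microtime(true);
$data = collectFragments(fn(int $i): string => "fragment-$i", 5);
printf("Collected %d fragments in %.1f ms\n", count($data), (microtime(true) - $start) * 1000);
```

The trade-off is throughput: each pause adds at least 1 ms per fragment, so lower CPU usage comes at the cost of a slower collection run.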
MySQL can check for duplicate entries and discard them, so the script no longer has to hold all the data in memory. This solved the memory issue.
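One way to let the database do the duplicate check is a UNIQUE column plus INSERT IGNORE, which makes MySQL skip rows that would violate the constraint. The post does not show its schema or which PHP extension it used, so this is a sketch assuming PDO; the fragments table, body column and function names are hypothetical. SQLite spells the statement INSERT OR IGNORE, and the code switches on the driver only so it can be tried without a MySQL server.

```php
<?php
// Hypothetical schema: the UNIQUE constraint is what rejects duplicates.
function createFragmentTable(PDO $pdo): void
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS fragments (
        body VARCHAR(255) NOT NULL UNIQUE
    )');
}

// Returns true if the fragment was new, false if it was a duplicate.
function storeFragment(PDO $pdo, string $body): bool
{
    $verb = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'sqlite'
        ? 'INSERT OR IGNORE'   // SQLite spelling, for local testing
        : 'INSERT IGNORE';     // MySQL spelling
    $stmt = $pdo->prepare("$verb INTO fragments (body) VALUES (?)");
    $stmt->execute([$body]);
    // A skipped duplicate affects 0 rows.
    return $stmt->rowCount() === 1;
}

// For MySQL (assumed DSN): new PDO('mysql:host=localhost;dbname=organizer', $user, $pass)
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createFragmentTable($pdo);

foreach (['a', 'b', 'a'] as $fragment) {
    echo $fragment, storeFragment($pdo, $fragment) ? " stored\n" : " duplicate\n";
}
```

Because the database keeps the set of seen values on disk, PHP only holds one fragment at a time instead of the whole data set.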
Next I used prepared statements for the MySQL queries, which sped up the script by 40%.
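The gain from prepared statements comes from preparing a statement once and executing it many times with different values. A rough sketch, again assuming PDO and a hypothetical fragments table and storeBatch() function; the SQLite branch exists only for trying it locally. One PDO-specific caveat: pdo_mysql emulates prepares on the client by default, so PDO::ATTR_EMULATE_PREPARES has to be set to false to get real server-side prepared statements.

```php
<?php
// Hypothetical batch insert: prepare once outside the loop, execute per row.
function storeBatch(PDO $pdo, iterable $fragments): int
{
    $verb = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'sqlite'
        ? 'INSERT OR IGNORE'   // SQLite spelling, for local testing
        : 'INSERT IGNORE';     // MySQL spelling
    // Prepared a single time; each execute() only supplies a new value.
    $stmt = $pdo->prepare("$verb INTO fragments (body) VALUES (:body)");

    $stored = 0;
    foreach ($fragments as $fragment) {
        $stmt->execute([':body' => $fragment]);
        $stored += $stmt->rowCount(); // 0 for skipped duplicates
    }
    return $stored;
}

// For MySQL (assumed DSN), disable emulation to use server-side prepares:
// new PDO('mysql:host=localhost;dbname=organizer', $user, $pass,
//         [PDO::ATTR_EMULATE_PREPARES => false]);
$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$pdo->exec('CREATE TABLE fragments (body VARCHAR(255) NOT NULL UNIQUE)');
echo storeBatch($pdo, ['x', 'y', 'x', 'z']), " new rows\n";
```

The contrast is with building and sending a fresh SQL string for every fragment, which makes the server parse the same statement over and over.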
Conclusion
- Using MySQL is a good option if you’re dealing with a huge amount of data and you don’t want duplicates.
- If you’re going to run a lot of repeated INSERT, UPDATE or SELECT queries, preparing the statements speeds them up.
- usleep() calls can reduce CPU load; the function pauses the process for a given number of microseconds.
